Better Guarantees for Sparsest Cut Clustering
Author
Abstract
The field of approximation algorithms for clustering is very active, and a large number of algorithms have been developed for clustering objectives such as k-median, min-sum, and sparsest cut clustering. For most of these objectives, the approximation guarantees do not match the known hardness results, and much effort is spent on obtaining tighter approximation guarantees [1, 4, 5, 8, 6, 9, 10]. However, many practical clustering problems, such as clustering proteins by function or clustering images by subject, have some unknown correct "target" clustering; in such cases the pairwise information is merely based on heuristics, and the real goal is to achieve low error on the data. In these settings, the implicit hope is that approximately optimizing objective functions such as those mentioned above will in fact produce a clustering of low error, i.e., a clustering that is close pointwise to the truth. Formally, for a set of n data points, the error of a clustering C′ = {C′_1, ..., C′_k} with respect to a target clustering C = {C_1, ..., C_k} is the fraction of points on which C and C′ disagree under the optimal matching of clusters in C to clusters in C′, i.e., err(C′) = min_σ (1/n) Σ_{i=1}^k |C_i \ C′_{σ(i)}|, where σ ranges over bijections of the cluster indices {1, ..., k}.
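The error notion above can be made concrete in a short sketch. This is a hypothetical helper, not code from the paper; representing clusterings as per-point label lists (rather than sets of clusters) is a convention of the example:

```python
from itertools import permutations

def clustering_error(target, found):
    """Fraction of points on which `found` disagrees with `target`
    under the best one-to-one matching of cluster labels.
    Each argument is a list of cluster indices in {0, ..., k-1},
    one entry per data point."""
    n = len(target)
    k = max(max(target), max(found)) + 1
    best_agree = 0
    # Brute-force over all matchings sigma of found-labels to
    # target-labels; fine for small k (the Hungarian method would
    # handle larger k).
    for sigma in permutations(range(k)):
        agree = sum(t == sigma[f] for t, f in zip(target, found))
        best_agree = max(best_agree, agree)
    return 1 - best_agree / n
```

For instance, `clustering_error([0, 0, 1, 1], [1, 1, 1, 0])` is 0.25: the best matching relabels found-cluster 1 as target-cluster 0 (and 0 as 1), leaving one of the four points misplaced.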
Similar resources
Approximate Hierarchical Clustering via Sparsest Cut and Spreading Metrics
Dasgupta recently introduced a cost function for the hierarchical clustering of a set of points given pairwise similarities between them. He showed that this function is NP-hard to optimize, but that a top-down recursive partitioning heuristic based on an α_n-approximation algorithm for uniform sparsest cut gives an approximation of O(α_n log n) (the current best algorithm has α_n = O(√(log n))). We sh...
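Dasgupta's cost function mentioned above can be sketched in a few lines. This is a minimal illustration under assumed representations of my own choosing (hierarchies as nested tuples of leaf labels, similarities as (i, j, weight) triples), not code from either paper:

```python
def leaf_sets(t):
    """Return (leaves of t, leaf sets of every subtree of t).
    A tree is a nested tuple; anything that is not a tuple is a leaf."""
    if not isinstance(t, tuple):
        return {t}, [{t}]
    leaves, all_sets = set(), []
    for child in t:
        child_leaves, child_sets = leaf_sets(child)
        leaves |= child_leaves
        all_sets += child_sets
    all_sets.append(leaves)
    return leaves, all_sets

def dasgupta_cost(tree, weighted_edges):
    """cost(T) = sum over similarity edges (i, j, w) of w times the
    number of leaves under the lowest common ancestor of i and j."""
    _, sets = leaf_sets(tree)
    cost = 0.0
    for i, j, w in weighted_edges:
        # The LCA subtree is the smallest subtree containing both leaves.
        cost += w * min(len(s) for s in sets if i in s and j in s)
    return cost
```

On the tree `((0, 1), (2, 3))` with unit-weight similarity edges (0, 1), (2, 3), and (0, 2), the first two edges are separated at subtrees of 2 leaves and the last only at the root (4 leaves), so the cost is 2 + 2 + 4 = 8; a good tree cuts low-similarity pairs near the root.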
Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering
We present and analyze a new framework for graph clustering based on a specially weighted version of correlation clustering that unifies several existing objectives and satisfies a number of attractive theoretical properties. Our framework, which we call LambdaCC, relies on a single resolution parameter λ, which implicitly controls both the edge density and sparsest cut of all output clusters...
O(√n)-Approximation Algorithm For Directed Sparsest Cut
We give an O(√n)-approximation algorithm for the Sparsest Cut Problem on directed graphs. A naïve reduction from Sparsest Cut to Minimum Multicut would only give an approximation ratio of O(√n log D), where D is the sum of the demands. We obtain the improvement using a novel LP-rounding method for fractional Sparsest Cut, the dual of Maximum Concurrent Flow.
Hierarchical Clustering via Spreading Metrics
We study the cost function for hierarchical clusterings introduced by [Dasgupta, 2016], where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [Dasgupta, 2016] that a top-down algorithm returns a hierarchical clustering of cost at most O(α_n log n) times the cost of the optimal hierarchical clustering, where ...
Embedding approximately low-dimensional $\ell_2^2$ metrics into $\ell_1$
Goemans showed that any n points x_1, ..., x_n in d dimensions satisfying ℓ_2^2 triangle inequalities can be embedded into ℓ_1 with worst-case distortion at most √d. We extend this to the case when the points are approximately low-dimensional, albeit with average distortion guarantees. More precisely, we give an ℓ_2-to-ℓ_1 embedding with average distortion at most the stable rank, sr(M), of the ...
Publication date: 2009